Performance of Inverted Indices in Shared-Nothing

Authors

  • Anthony Tomasic
  • Hector Garcia-Molina
Abstract

The performance of distributed text document retrieval systems is strongly influenced by the organization of the inverted index. This paper compares the performance impact on query processing of various physical organizations for inverted lists. We present a new probabilistic model of the database and queries. Simulation experiments determine which variables most strongly influence response time and throughput. This leads to a set of design trade-offs over a range of hardware configurations and new parallel query processing strategies.

Similar resources

Effect of Inverted Index Partitioning Schemes on Performance of Query Processing in Parallel Text Retrieval Systems

Shared-nothing, parallel text retrieval systems require an inverted index, representing a document collection, to be partitioned among a number of processors. In general, the index can be partitioned based on either the terms or documents in the collection, and the way the partitioning is done greatly affects the query processing performance of the parallel system. In this work, we investigate ...
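The two partitioning schemes named in this abstract can be illustrated with a minimal Python sketch. It is not taken from any of the listed papers: the function names, the toy corpus, and the modulo-based placement rules are all assumptions chosen for brevity, and document ids are assumed to be integers.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def term_owner(term, num_nodes):
    """Toy placement rule (an assumption): sum of character codes mod node count."""
    return sum(map(ord, term)) % num_nodes

def partition_by_document(docs, num_nodes):
    """Document-based: each node indexes only its own slice of the documents."""
    shards = [dict() for _ in range(num_nodes)]
    for doc_id, text in docs.items():
        shards[doc_id % num_nodes][doc_id] = text
    return [build_index(shard) for shard in shards]

def partition_by_term(docs, num_nodes):
    """Term-based: each node holds the complete posting lists for some terms."""
    nodes = [dict() for _ in range(num_nodes)]
    for term, postings in build_index(docs).items():
        nodes[term_owner(term, num_nodes)][term] = postings
    return nodes

def and_query_document_partitioned(nodes, terms):
    """Every node answers the whole query locally; partial results are merged."""
    hits = []
    for node in nodes:
        local = [set(node.get(t, [])) for t in terms]
        hits.extend(set.intersection(*local))
    return sorted(hits)

def and_query_term_partitioned(nodes, terms):
    """Only nodes owning a query term respond; a coordinator intersects the lists."""
    lists = [set(nodes[term_owner(t, len(nodes))].get(t, [])) for t in terms]
    return sorted(set.intersection(*lists))

# Hypothetical four-document corpus split over two nodes.
docs = {
    0: "inverted index performance",
    1: "parallel query processing",
    2: "inverted index query",
    3: "shared nothing parallel index",
}
query = ["inverted", "index"]
result_doc = and_query_document_partitioned(partition_by_document(docs, 2), query)
result_term = and_query_term_partitioned(partition_by_term(docs, 2), query)
```

Both layouts return the same answer for a conjunctive query. What differs is which nodes do the work: the document-based layout involves every node for every query, while the term-based layout contacts only the nodes that own a query term but must ship whole posting lists to a coordinator.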


Efficient Query Processing on Term-Based-Partitioned Inverted Indexes

In a shared-nothing, parallel text retrieval system, queries are processed over an inverted index that is partitioned among a number of index servers. In practice, the inverted index is either document-based or term-based partitioned, depending on properties of the underlying hardware infrastructure, query traffic, and some performance and availability constraints. In query processing on term-b...


On the Parallel Implementation of Sparse Matrix Information Retrieval Engine

We demonstrate a parallel implementation of a sparse matrix information retrieval engine. We use a shared nothing PC cluster. We perform our experiments with TREC disk 4 and 5 data, a NIST 2 Gigabytes standard benchmark text collection on 2, 4, 6, 8, 10, 12 and 14 processing nodes with different queries. We compare the results with the results of sequential inverted index, a conventional and co...


Caching and Database Scaling in Distributed Shared-Nothing Information Retrieval Systems

A common class of existing information retrieval system provides access to abstracts. For example Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this paper this database is studied by using a trace-driven simulation. We focus on physical index design, inverted inde...


Scaling in Distributed Shared-Nothing

A common class of existing information retrieval system provides access to abstracts. For example Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this paper this database is studied by using a trace-driven simulation. We focus on physical index design, inverted ind...




Publication date: 1993